Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis
Abstract
This work aims to improve text-to-speech synthesis for Wikipedia by advancing and implementing models of prosodic prominence. We propose a new system architecture with explicit prominence modeling and test the first component of the architecture. We automatically extract a phonetic feature related to prominence from the speech signal in the ARCTIC corpus. We then modify the label files and use the feature to train an experimental TTS system with Merlin, a statistical-parametric DNN-based engine. Test sentences with contrastive prominence at the word level are synthesised, and separate listening tests are conducted to evaluate a) the level of prominence control in the generated speech and b) its naturalness. Our results show that the prominence feature-enhanced system successfully places prominence on the appropriate words and increases perceived naturalness relative to the baseline.
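The label-modification step can be pictured with a minimal sketch. The abstract does not specify how the feature is encoded, so everything here is an assumption made for illustration: the labels are taken to be HTS-style lines of the form "start end context", the prominence score is taken to be a per-word value in [0, 1], and the `/PROM:` tag, the three-level quantisation, and the function names are hypothetical.

```python
def quantize(value, n_levels=3):
    """Map a prominence score in [0, 1] to an integer level 0..n_levels-1.

    The number of levels is an illustrative assumption, not taken from the paper.
    """
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * n_levels), n_levels - 1)


def add_prominence(label_lines, word_index_per_line, word_scores, tag="/PROM:"):
    """Append a prominence level to each label line.

    label_lines: HTS-style lines, "start end context" (assumed format).
    word_index_per_line: index of the word each phone belongs to,
        or None for silence and pauses.
    word_scores: one prominence score per word, in [0, 1].
    tag: hypothetical field name appended to the context string.
    """
    out = []
    for line, word_index in zip(label_lines, word_index_per_line):
        # Phones outside any word (silence, pauses) get the lowest level.
        level = quantize(word_scores[word_index]) if word_index is not None else 0
        out.append(f"{line.rstrip()}{tag}{level}")
    return out


if __name__ == "__main__":
    labels = ["0 100 a^b-c+d=e", "100 200 x^x-sil+x=x"]
    for line in add_prominence(labels, [0, None], [0.9]):
        print(line)
```

For a field like this to reach the network input, the question set used to convert labels into numerical features would also need a matching entry for each level; at synthesis time, changing the level on the phones of a given word would then be the handle for contrastive prominence.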
Similar resources
A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data
In this paper, we evaluate a framework of statistical parametric speech synthesis based on Gaussian process regression (GPR) and compare it with those based on hidden Markov model (HMM) and deep neural network (DNN). Recently, for the purpose of improving the performance of HMM-based speech synthesis, novel frameworks using deep architectures have been proposed and have shown their effectivenes...
Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework
In this paper, we propose a model integration method for hidden Markov model (HMM) and deep neural network (DNN) based acoustic models using a product-of-experts (PoE) framework in statistical parametric speech synthesis. In speech parameter generation, DNN predicts a mean vector of the probability density function of speech parameters frame by frame while keeping its covariance matrix constant...
On the impact of phoneme alignment in DNN-based speech synthesis
Recently, deep neural networks (DNNs) have significantly improved the performance of acoustic modeling in statistical parametric speech synthesis (SPSS). However, in current implementations, when training a DNN-based speech synthesis system, phonetic transcripts are required to be aligned with the corresponding speech frames to obtain the phonetic segmentation, called phoneme alignment. Such an...
Multiple feed-forward deep neural networks for statistical parametric speech synthesis
In this paper, we investigate a combination of several feedforward deep neural networks (DNNs) for a high-quality statistical parametric speech synthesis system. Recently, DNNs have significantly improved the performance of essential components in the statistical parametric speech synthesis, e.g. spectral feature extraction, acoustic modeling and spectral post-filter. In this paper our proposed...
Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model
This paper proposes a deep neural network (DNN)-based statistical parametric speech synthesis system using an improved time-frequency trajectory excitation (ITFTE) model. The ITFTE model, which efficiently reduces the parametric redundancy of a TFTE model, improved the perceptual quality of the vocoding process and the estimation accuracy of the training process. However, there remain problems ...